1. INTRODUCTION

Insider threats are a critical threat source for an organization because insiders have elevated access and
opportunity and can therefore cause considerable damage. In comparison with outsiders, insiders
have more privileged and legitimate access to information and facilities. Moreover, insiders are
knowledgeable about the organization and its critical assets. This additional knowledge makes it easy
for insiders to conduct an attack because they can hide their trail and activities [1-3].
A 2018 insider threat report showed that 53% of threats in the preceding 12 months came from within an
organization [4]. Moreover, 27% of surveyed organizations stated that attacks originate from
inside [4]. Thus, most organizations that implement cybersecurity techniques, such as intrusion detection,
firewalls, and electronic access systems, aim to protect data not only from outside threats but also from
insider threats [5]. In recent decades, many insider threat incidents have reached the media; for
example, the well-known data leakage cases involving Edward Snowden, Daniel Ellsberg,
and Chelsea Manning [6]. In contrast to threats by outsiders, insider threats are easy to carry out and
require no experience or advanced technical knowledge, given the authorized access that insiders have and
their knowledge of the vulnerabilities of business processes and deployed systems. Whereas
outsiders find it hard to hide their hacking trails, malicious insiders are difficult to detect [6, 7].

A recurrent neural network (RNN) considers both the current input and previous inputs, which
distinguishes it from other neural networks. RNNs have therefore been used extensively for
problems in which input order matters, such as natural language processing [8, 9]. In the present study,
we propose a conceptual method for insider threat detection on the basis of the behaviors of an insider.
In addition, the gated recurrent unit (GRU) neural network is explored as a method for enhancing insider
threat detection. Similar to the long short-term memory (LSTM) approach, GRUs can capture long-term
temporal dependencies in the sequence of user actions through the hidden units they use to
record temporal behavior patterns; because their structure is simpler, GRUs also save computing resources
and training time.


2. RELATED WORK 

In recent years, cyber security has become a societal, infrastructural, and economic concern for
every country in the world because of the tremendous number of electronic devices that are interconnected
via communication networks [10-15]. One of these cyber security problems is the insider threat,
which has grown and has thus attracted considerable attention from researchers
[7]. Numerous studies have been conducted in this field. The works in [16, 17] proposed an approach to
insider threat detection by applying the hidden Markov model (HMM). These studies used the HMM to
model the normal behavior of users in order to detect anomalous behaviors that deviate from the norm.
With the HMM, the number of states has a strong impact on the effectiveness of the method. However,
increasing the number of states increases the computational cost of the HMM.
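
To make the state-count trade-off concrete, the following is a minimal sketch of how an HMM of normal behavior could score an action sequence with the scaled forward algorithm. It is not the implementation used in [16, 17]; the two hidden states, the three action symbols, and every probability below are invented for illustration. The inner update visits every pair of states, so the work per time step grows quadratically with the number of states.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of an observation sequence under an HMM (scaled forward algorithm)."""
    n = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Every (previous state, next state) pair is visited: O(n^2) work per step.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik

# Hypothetical model of "normal" behavior. Symbols: 0 = logon, 1 = file access, 2 = USB copy.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.60, 0.35, 0.05], [0.30, 0.65, 0.05]]

typical = [0, 1, 0, 1]
unusual = [2, 2, 2, 2]
typical_score = forward_log_likelihood(typical, start, trans, emit)
unusual_score = forward_log_likelihood(unusual, start, trans, emit)
print(typical_score, unusual_score)
```

A sequence whose log-likelihood falls well below that of typical sequences would be flagged as a deviation from the norm; the threshold itself would have to be chosen on real data.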

Machine learning techniques have powerful capabilities for improving detection performance
[18-20] and self-adaptive abilities to handle changes in the insider threat environment; however, these
techniques are still affected by imbalanced data and by a lack of extensive knowledge of insider behavior
patterns [21]. For example, to model the daily log time series, the work in [22] suggested a one-class
support vector machine (OCSVM) approach that treats insider threat detection as a stream
mining problem and demonstrates higher accuracy and fewer false positives than the traditional OCSVM.

Recently, deep learning and RNN approaches have been applied in the field of insider threat
detection. The work in [23] utilized deep neural networks and RNNs to detect insiders; these
neural networks are trained to recognize the activities that are characteristic of each user in the network
and simultaneously assess, in real time, whether the behavior of the user is normal or anomalous.

Similar to our proposed solution but with a different technique, the work in [21] utilized the LSTM to
model the insider activity log as a natural language sequence, wherein the model extracts features and
detects anomalies when log patterns deviate from the trained model. The evaluation of that model
was based on a limited number of users: eight users were randomly selected as a group from the
experimental dataset.


3. PROPOSED METHOD 

This section describes the proposed insider threat detection method.
The method utilizes the GRU to model insider activity as a sequence, similar to a natural
language sequence. GRUs are selected in this study for their simplicity and faster training compared with
the LSTM.


3.1. Log Files 

The log files record all the activities performed by employees in an organization. These events come
from many different sources, such as logon, device, HTTP, file, system call, and email
event logs.


3.2. Data Processing 

During this stage, all the operations are collected and extracted for every user from multiple
source files. The operational data are then organized into sequences on the basis of each individual
user’s daily actions. As in natural language modeling, an action corresponds to a word,
and a sequence of actions corresponds to a sentence. Thus, every user has a list of the actions performed
on each day. Finally, before a user's action sequence is input into the GRU classifier
model, each action must be converted to a one-hot vector. The proposed method is shown in Figure 1.
Figure 1. Proposed method
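
The data processing stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the events are assumed to be already merged from the source files and sorted chronologically, and the user IDs, dates, action names, and helper functions (`build_daily_sequences`, `build_vocabulary`, `one_hot`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-merged events as (user, date, action) tuples.
events = [
    ("U001", "2010-01-04", "logon"),
    ("U001", "2010-01-04", "http"),
    ("U001", "2010-01-04", "logoff"),
    ("U002", "2010-01-04", "logon"),
    ("U002", "2010-01-04", "device_connect"),
    ("U001", "2010-01-05", "logon"),
    ("U001", "2010-01-05", "email"),
]

def build_daily_sequences(events):
    """Group actions into one ordered sequence (a 'sentence') per (user, day)."""
    sequences = defaultdict(list)
    for user, day, action in events:  # assumed to be in chronological order
        sequences[(user, day)].append(action)
    return dict(sequences)

def build_vocabulary(sequences):
    """Map each distinct action (a 'word') to an integer index."""
    actions = sorted({a for seq in sequences.values() for a in seq})
    return {action: i for i, action in enumerate(actions)}

def one_hot(seq, vocab):
    """Encode a sequence of actions as a list of one-hot vectors."""
    vectors = []
    for action in seq:
        vector = [0] * len(vocab)
        vector[vocab[action]] = 1
        vectors.append(vector)
    return vectors

sequences = build_daily_sequences(events)
vocab = build_vocabulary(sequences)
encoded = one_hot(sequences[("U001", "2010-01-04")], vocab)
print(encoded)
```

With many distinct actions, one-hot vectors become long and sparse; the sketch keeps them as plain lists only for readability.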


3.3. Anomaly Detection 

Identifying a user activity as a malicious insider action is difficult given the complexity of the insider
threat problem. This section discusses an insider anomaly detection technique based on the features of
each user's action sequences. The GRU, a variant of the LSTM that was introduced in [24, 25], is the
anomaly detection technique used in this study. It is similar to the LSTM but has no output
gate. Therefore, GRUs fully expose the contents of their memory cell to the larger network at every time step.
The internal structure of the GRU is simple, which accelerates training because fewer computations are
required to update the hidden state. The gated recurrent unit structure is shown in Figure 2.
Figure 2. Gated recurrent unit structure
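
The GRU update described above can be written out as a short sketch in plain Python. It uses the standard formulation with an update gate z and a reset gate r, in the convention where the new state is (1 - z) * h + z * candidate. The weights are random and untrained, the layer sizes are arbitrary, and the parameter names (`W_z`, `U_z`, `b_z`, and so on) are illustrative assumptions rather than details taken from the study.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gate(p, name, x, h, activation):
    """Compute activation(W x + U h + b) for the parameter group called `name`."""
    wx = matvec(p["W_" + name], x)
    uh = matvec(p["U_" + name], h)
    return [activation(a + b + c) for a, b, c in zip(wx, uh, p["b_" + name])]

def gru_step(x, h, p):
    """One GRU update: there is no output gate, so the new state h is exposed directly."""
    z = gate(p, "z", x, h, sigmoid)      # update gate: how much of the state to overwrite
    r = gate(p, "r", x, h, sigmoid)      # reset gate: how much past state feeds the candidate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = gate(p, "h", x, reset_h, math.tanh)
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]

def init_params(n_in, n_hidden, seed=0):
    """Random, untrained parameters (illustration only)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for name in ("z", "r", "h"):
        p["W_" + name] = mat(n_hidden, n_in)
        p["U_" + name] = mat(n_hidden, n_hidden)
        p["b_" + name] = [0.0] * n_hidden
    return p

def encode_sequence(seq, p):
    """Run the GRU over a sequence of input vectors and return the final hidden state."""
    h = [0.0] * len(p["b_z"])
    for x in seq:
        h = gru_step(x, h, p)
    return h

params = init_params(n_in=3, n_hidden=4)
day = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # three one-hot encoded actions
summary = encode_sequence(day, params)
print(summary)
```

In a classifier, the final hidden state would typically be passed to an output layer that produces the normal or anomalous decision; that layer and the training procedure are omitted here.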


3.4. Experiment

To evaluate the performance of the proposed method, we ran our model on the
public insider threat dataset CERT v4.2. This version was chosen because it contains more
insider threat instances than the previous versions of the CERT datasets. Over a period of 17 months,
32,770,227 log lines were generated by 1000 users; among these logs are 7323 anomalous
activities, which were injected manually by domain experts to represent the malicious insider threat
scenarios described by CERT. A detailed data description is given in Table 1.


Table 1. Detailed Data Description from the Files of the r4.2 Dataset
File           Features
Logon          ID, DATE, USER, PC, ACTIVITY
File           ID, DATE, USER, PC, FILENAME, CONTENT
Device         ID, DATE, USER, PC, ACTIVITY
HTTP           ID, DATE, USER, PC, URL, CONTENT
Email          ID, DATE, USER, CC, SIZE, ATTACHMENT_COUNT, CONTENT
Psychometric   EMPLOYEE_NAME, USER_ID
LDAP           EMPLOYEE_NAME, USER_ID, E-MAIL, ROLE, PROJECTS, BUSINESS_UNIT, FUNCTIONAL_UNIT,
               DEPARTMENT, TEAM, SUPERVISOR
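
The dataset figures reported above imply a heavy class imbalance, which the short calculation below makes explicit. It assumes, for illustration only, that each anomalous activity corresponds to exactly one log line; the variable names are hypothetical.

```python
total_log_lines = 32_770_227  # log lines reported for the 17-month period
anomalous = 7_323             # injected anomalous activities reported for the dataset

# Assumption: one anomalous activity equals one log line.
normal = total_log_lines - anomalous
anomalous_fraction = anomalous / total_log_lines
normal_per_anomalous = normal / anomalous

print(f"anomalous share of log lines: {anomalous_fraction:.4%}")
print(f"normal log lines per anomalous line: {normal_per_anomalous:.0f}")
```

Under that assumption, anomalous activity accounts for roughly 0.02% of all log lines, that is, several thousand normal lines for every anomalous one.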


3.5. Initial Results

This is the last stage, which tests whether an insider's conduct can be considered malicious.
The corresponding log sequences are extracted and sent to the result model, which then outputs
the GRU's classification result for each sequence. The evaluation shows that the model
can successfully classify most insiders, with an accuracy of up to 0.92 and a loss value of around 0.29
when trained for 20 epochs, as shown in Figure 3.
Figure 3. Accuracy and loss result
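
For readers unfamiliar with the two reported quantities, the sketch below shows how accuracy and loss are commonly computed for a binary classifier. The study does not state which loss function was used; binary cross-entropy is assumed here, and the labels, predicted probabilities, and 0.5 decision threshold are made up for illustration.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of sequences whose thresholded prediction matches the label."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical labels (1 = malicious sequence) and predicted probabilities.
y_true = [0, 0, 1, 0, 1]
y_prob = [0.1, 0.4, 0.8, 0.7, 0.3]

print("accuracy:", accuracy(y_true, y_prob))
print("loss:", binary_cross_entropy(y_true, y_prob))
```

Accuracy counts only whether each prediction lands on the correct side of the threshold, whereas the loss also penalizes confident mistakes, which is why the two curves are usually reported together.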


4. CONCLUSION 

Insider threat is a dangerous security threat and has become a primary
concern for enterprises of all sizes. This study proposed a conceptual method for insider threat detection.
The log file stage collects multiple logs from different sources. The preprocessing stage organizes the logs
into a sequence on the basis of an individual user’s daily actions. The GRU is then suggested for insider
threat detection. Finally, the result stage outputs the results of the classification model.
The proposed method can therefore help detect malicious behavior inside an organization. Future work will
implement the proposed method on real and public datasets and combine it with a modified adaptive synthetic
oversampling technique (ADASYN) algorithm to handle the imbalance of insider threat datasets.